47 research outputs found

    LAD-LASSO: SIMULATION STUDY OF ROBUST REGRESSION IN HIGH DIMENSIONAL DATA

    A common issue in regression is that the number of predictor variables exceeds the number of observations, a situation called high-dimensional data. The classical problem in this setting is multicollinearity, and it becomes worse when the data are subject to heavy-tailed errors or outliers, which may appear in the responses and/or the predictors. For this reason, Wang et al. (2007) combined Least Absolute Deviation (LAD) regression, which is useful for robust regression, with the LASSO, a popular choice for shrinkage estimation and variable selection, to obtain the LAD-LASSO. Extensive simulation studies show that LAD-LASSO performs better than LASSO on high-dimensional datasets that contain outliers. Keywords: high dimensional data, LAD-LASSO, robust regression
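
The LAD-LASSO criterion replaces the squared-error loss of the LASSO with absolute deviations while keeping the L1 penalty, i.e. it minimizes sum_i |y_i - x_i'b| + lam * sum_j |b_j|. The Python below is a minimal sketch of that criterion, assuming a single penalty lam for every coefficient and no intercept (the published LAD-LASSO allows coefficient-specific penalties). The solver is a swapped-in technique: coordinate descent in which each one-dimensional update is solved exactly by a weighted median. It is not necessarily how the study computed its estimates, and coordinate descent on this non-smooth objective is not guaranteed to reach the global minimum in general. The names `weighted_median`, `lad_lasso_objective` and `lad_lasso_cd` are hypothetical.

```python
def weighted_median(values, weights):
    """Return a minimizer m of sum(w * |v - m|) over the given points."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def lad_lasso_objective(X, y, beta, lam):
    """Sum of absolute residuals plus an L1 penalty (no intercept)."""
    residuals = (yi - sum(xij * bj for xij, bj in zip(xi, beta)) for xi, yi in zip(X, y))
    return sum(abs(r) for r in residuals) + lam * sum(abs(b) for b in beta)


def lad_lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for the LAD-LASSO objective (hypothetical helper).

    For a single coordinate j the objective is
        sum_i |x_ij| * |r_i / x_ij - b_j| + lam * |0 - b_j|,
    which a weighted median minimizes exactly.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # The penalty behaves like a pseudo-observation at 0 with weight lam.
            values, weights = [0.0], [lam]
            for i in range(n):
                if X[i][j] != 0:
                    # Partial residual with coordinate j left out.
                    r = y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                    values.append(r / X[i][j])
                    weights.append(abs(X[i][j]))
            beta[j] = weighted_median(values, weights)
    return beta


if __name__ == "__main__":
    X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
    y = [2, 4, 6, 8, 100]  # the last response is an outlier
    print(lad_lasso_cd(X, y, lam=0.5))  # [2.0, 0.0]
```

On this toy data the first coefficient stays at 2 despite the outlying last response and the irrelevant second predictor is set to 0, whereas a squared-error loss would let the outlier pull the first coefficient upward.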

    SMALL AREA ESTIMATION FOR ESTIMATING THE NUMBER OF INFANT MORTALITY USING MIXED EFFECTS ZERO INFLATED POISSON MODEL

    The Demographic and Health Survey Indonesia (DHSI) is a nationally designed survey that provides information on birth rates, mortality rates, family planning and health. DHSI was conducted by BPS in cooperation with the National Population and Family Planning Institution (BKKBN), the Indonesian Ministry of Health (KEMENKES) and USAID. According to the DHSI 2012 publication, the infant mortality rate for the five-year period before the survey was 32 per 1,000 live births. In this paper, Small Area Estimation (SAE) is used to estimate the number of infant deaths in the districts of West Java, with the SAE model formulated as a special case of Generalized Linear Mixed Models (GLMM). The number of infant deaths is modelled with a Poisson distribution, which assumes equidispersion. The methods used to handle overdispersion are the negative binomial and quasi-likelihood models, and the analysis showed that the quasi-likelihood model handles the overdispersion problem best. However, checking the residual assumptions showed that the model residuals still formed two normal distributions. To resolve this issue, a Mixed Effect Zero Inflated Poisson (ZIP) Model was used, built on the basic area-level model of small area estimation. The mean square error (MSE), computed with a bootstrap method, is used to measure the accuracy of the small area estimates. Keywords: SAE, GLMM, Mixed Effect ZIP Model, Bootstrap
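
The zero-inflated Poisson distribution mixes a point mass at zero (probability pi) with a Poisson(lam) count, so P(Y = 0) = pi + (1 - pi) e^{-lam} and P(Y = k) = (1 - pi) e^{-lam} lam^k / k! for k >= 1. Its mean is (1 - pi) lam and its variance is (1 - pi) lam (1 + pi lam), which exceeds the mean whenever pi > 0 and lam > 0, unlike the equidispersed Poisson. The Python sketch below only illustrates this marginal distribution; it leaves out the area-level random effects, covariates and bootstrap MSE of the mixed-effects model in the paper, and the function names are hypothetical.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson with structural-zero probability pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from the structural component or from the Poisson component.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_mean_var(lam, pi):
    """Mean and variance of the zero-inflated Poisson distribution."""
    mean = (1 - pi) * lam
    var = (1 - pi) * lam * (1 + pi * lam)
    return mean, var


if __name__ == "__main__":
    mean, var = zip_mean_var(2.0, 0.3)
    # With pi > 0 the variance is larger than the mean (overdispersion).
    print(mean, var, var > mean)
```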

    SUBDISTRICT CLUSTERING IN WEST JAVA PROVINCE BASED ON DISEASE INCIDENCE OF JKN PARTICIPANTS PRIMARY SERVICES

    One way to optimize health services and to distribute facilities and infrastructure efficiently over a wide area is to profile and cluster the sub-districts of West Java province according to similarities in their disease categories. Two methods are compared to obtain the best clustering: hierarchical clustering and ensemble clustering. The research data are the BPJS Kesehatan capitation primary service sample data for the 2017-2018 period. Important variables include the patients' primary diagnosis (ICD-10) at the puskesmas, service time, type of visit, and the sub-district where the service took place. Several evaluation metrics (the Silhouette coefficient, Dunn index, Davies-Bouldin index, and C-index) are used to determine the optimal number of clusters, and descriptive analysis and visualization of the clustering results are also considered when selecting the optimal clustering. Based on the evaluation results, the optimal method is hierarchical clustering with complete linkage. This method produces three clusters: cluster 1 consists of 5 sub-districts with a high (dominant) mean value in almost all disease categories, cluster 2 consists of 26 sub-districts with a medium mean value, and cluster 3 consists of 589 sub-districts with a low mean value. Most members of clusters 1 and 2 are sub-districts in the districts/cities around the national capital (DKI Jakarta) and the provincial capital (Bandung), while most members of cluster 3 are sub-districts in suburban districts/cities or far from the centers of government.
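
Complete linkage merges, at each step, the two clusters whose farthest pair of members is closest, and the silhouette coefficient compares each unit's mean distance to its own cluster with its mean distance to the nearest other cluster (higher is better). The pure-Python sketch below shows both on toy one-dimensional data, assuming Euclidean distance; the study's actual features, distance measure and other criteria (Dunn, Davies-Bouldin, C-index) are not reproduced, and the function names are hypothetical. Linkages are recomputed naively, so it is meant for small illustrations only, and the silhouette needs at least two clusters.

```python
import math


def complete_linkage(points, k):
    """Agglomerative clustering with complete linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: distance between the farthest pair of members.
        return max(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters


def mean_silhouette(points, clusters):
    """Average silhouette coefficient; members of singleton clusters score 0."""
    scores = []
    for ci, members in enumerate(clusters):
        for a in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            intra = sum(math.dist(points[a], points[b]) for b in members if b != a) / (len(members) - 1)
            inter = min(
                sum(math.dist(points[a], points[b]) for b in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append((inter - intra) / max(intra, inter))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    points = [(0.0,), (0.1,), (5.0,), (5.2,), (10.0,)]
    for k in (2, 3):
        clusters = complete_linkage(points, k)
        print(k, clusters, round(mean_silhouette(points, clusters), 3))
```

Comparing the average silhouette across candidate values of k, as in the loop above, is one way to pick the number of clusters; here k = 3 scores higher than k = 2.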

    SURVIVAL ANALYSIS WITH EXTENDED COX MODEL ABOUT DURABILITY DEBTOR EFFORTS ON CREDIT RISK

    This study applies survival analysis to motorcycle credit financing data on loans that went bad, measured from the start of the credit, with sixteen covariates considered. The model used is the Cox proportional hazards model, which relies on the proportional hazards assumption. The extended Cox model is used to improve on the Cox proportional hazards model when one or more covariates do not meet this assumption. It extends the Cox model with time-dependent variables: covariates that violate the proportional hazards assumption are interacted with appropriate functions of time to obtain time-dependent covariates, so the model contains both time-independent and time-dependent covariates. The parameters of these covariates are estimated by the maximum partial likelihood method. A likelihood ratio test was used to determine whether the extended Cox model suits the data. The results indicate that the extended Cox model with appropriate time functions provides the best model. Keywords: Credit Risk, Survival Analysis, Cox Proportional Hazard, Extended Cox Model
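
In the extended Cox model, a covariate x that violates proportional hazards enters the linear predictor as beta * x + gamma * x * g(t), so its hazard ratio changes over time through the chosen function g. The Python sketch below evaluates the resulting partial log-likelihood for a single covariate, assuming g(t) = log(t) by default and no tied event times. In practice the parameters are found by maximizing this function, and twice the difference between the maximized log-likelihoods of the extended model and the ordinary Cox model (gamma = 0) gives the likelihood ratio statistic. The function names and the choice of g are hypothetical; the paper's covariates and time functions are not reproduced.

```python
import math


def extended_cox_loglik(times, events, x, beta, gamma, g=math.log):
    """Partial log-likelihood with linear predictor beta*x + gamma*x*g(t).

    Assumes no tied event times (no Breslow or Efron correction).
    """
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        gt = g(t_i)
        # The time-dependent term is evaluated at the current event time t_i.
        eta = [beta * xj + gamma * xj * gt for xj in x]
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(math.exp(eta[j]) for j in risk_set))
    return loglik


def likelihood_ratio(loglik_reduced, loglik_full):
    """LR statistic from two maximized partial log-likelihoods."""
    return 2.0 * (loglik_full - loglik_reduced)


if __name__ == "__main__":
    times, events, x = [1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1], [0.5, 1.0, 2.0, 1.5]
    print(extended_cox_loglik(times, events, x, beta=0.3, gamma=0.0))
    print(extended_cox_loglik(times, events, x, beta=0.3, gamma=0.2))
```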

    NEGATIVE BINOMIAL REGRESSION METHODS TO ANALYZE FACTORS AFFECTING CHILD MORTALITY RATES IN WEST JAVA

    Data on the number of child mortality cases are discrete (count) data, which are usually analyzed with Poisson regression. Poisson regression requires the mean and variance to be equal, whereas in practice count data often have a variance greater than the mean, a condition referred to as overdispersion. To deal with overdispersion, the data can be modelled with Negative Binomial Regression, because it does not require the mean to equal the variance. The Negative Binomial model produces a deviance/degrees of freedom value of 1.6347 and a Pearson Chi-Square value of 1.4569. These values are close to 1, which means that the overdispersion problem was solved. Keywords: child mortality rates; Negative Binomial Regression; overdispersion
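
The dispersion check described above can be illustrated with the Pearson statistic: the sum of (y_i - mu_i)^2 / V(mu_i) divided by the residual degrees of freedom, where V(mu) = mu for the Poisson model and V(mu) = mu + alpha * mu^2 for the (NB2) negative binomial. Values well above 1 suggest overdispersion relative to the assumed variance. The sketch below assumes the fitted means mu_i and the dispersion alpha are already available from a fitted model; the counts, means and alpha in the usage are made up for illustration, and the function name is hypothetical.

```python
def pearson_dispersion(y, mu, n_params, alpha=0.0):
    """Pearson chi-square divided by the residual degrees of freedom.

    Variance function mu + alpha * mu**2: alpha = 0 gives the Poisson
    case, alpha > 0 the NB2 negative binomial.
    """
    chi2 = sum((yi - mi) ** 2 / (mi + alpha * mi ** 2) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


if __name__ == "__main__":
    y = [0, 0, 1, 9, 10]      # hypothetical counts
    mu = [4.0] * len(y)       # intercept-only fit: every fitted mean is 4
    print(pearson_dispersion(y, mu, n_params=1))                       # 6.375
    print(round(pearson_dispersion(y, mu, n_params=1, alpha=1.0), 3))  # 1.275
```

Under the Poisson variance the ratio is far above 1, while the negative binomial variance brings it much closer to 1 for the same fitted means.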

    PROFILE ANALYSIS OF UNMET NEED FOR FAMILY PLANNING INDICATORS USING SDKI AND SUSENAS DATA.

    Government development programs need data that are available as time series and as small area estimates with good accuracy in order to achieve program objectives effectively and efficiently. One indicator the Indonesian government uses to measure progress in controlling population growth through family planning is the unmet need for family planning. This indicator is obtained from the Indonesian Demographic and Health Survey (IDHS), conducted in 1987, 1991, 1994, 1997, 2002-2003, 2007 and 2012, with provincial-level estimates. Given this estimation period and level, annual estimates of unmet need for family planning at the regency/municipality level are needed to monitor the achievement of family planning programs better. In this research, the estimation uses the fertility and family planning data available in the National Socioeconomic Survey (Susenas), an annual survey with regency/municipality-level estimates. The profile analysis of the IDHS and Susenas 2012 data statistically demonstrates the similarity of the two data sources. Based on the test results, Susenas data can be used to estimate the unmet need for family planning indicator at the regency/municipality level. Keywords: Profile Analysis, Family Planning, SDKI, Susenas

    Gini Ratio Prediction by Estimating the Components Based on the Ybarra-Lohr Model Small Area Estimation with Estimated Sampling Variance

    The Gini ratio is one of the tools used to measure income inequality, so it is useful to know its value at smaller regional levels such as the subdistrict. According to Badan Pusat Statistik (BPS), the components of the Gini ratio are the average per capita expenditure and the relative frequency of households in each expenditure class of the subdistrict. The per capita expenditure data available through SUSENAS are designed to produce statistics from the national level down to the district level, so estimates must be made for the subdistrict expenditure classes. Direct estimation from a small sample can produce large standard errors; therefore Small Area Estimation (SAE) with a logarithm transformation is used to estimate the average per capita expenditure for each subdistrict expenditure class in Depok City in 2020. The Ybarra-Lohr area-level model was used because the available auxiliary data contain measurement error. Beforehand, the sampling variance required to estimate the average per capita expenditure was estimated by comparing several estimation methods. Of these, the probability distribution method produced the average per capita expenditure estimates with the smallest RRMSE, with a random effect variance of 0.686 and a goodness of fit of 0.929 for the Ybarra-Lohr model. The best estimates of the average per capita expenditure for each expenditure class are then used to obtain the Gini ratio for each subdistrict in Depok City in 2020.
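
The grouped-data form of the Gini ratio combines the two components named above. With classes ordered from lowest to highest expenditure, f_i the relative frequency of households in class i and Q_i the cumulative share of total expenditure up to class i, G = 1 - sum_i f_i (Q_i + Q_{i-1}) with Q_0 = 0. The Python sketch below assumes this formula and assumes that a class's share of total expenditure is proportional to f_i times its average per capita expenditure; the function name and example numbers are hypothetical, not values from the Depok City study.

```python
def gini_from_classes(rel_freq, mean_expenditure):
    """Grouped-data Gini ratio: G = 1 - sum_i f_i * (Q_i + Q_{i-1}).

    rel_freq[i]: share of households in expenditure class i (sums to 1).
    mean_expenditure[i]: average per capita expenditure in class i.
    Classes must be ordered from lowest to highest expenditure.
    """
    # Assumed expenditure mass of each class: frequency times class mean.
    totals = [f * m for f, m in zip(rel_freq, mean_expenditure)]
    grand_total = sum(totals)
    gini, q_prev, q_cum = 1.0, 0.0, 0.0
    for f, t in zip(rel_freq, totals):
        q_cum += t / grand_total  # cumulative share of total expenditure
        gini -= f * (q_cum + q_prev)
        q_prev = q_cum
    return gini


if __name__ == "__main__":
    # Four equally sized classes with rising average expenditure.
    print(gini_from_classes([0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 4.0]))
```

Equal average expenditure in every class gives a Gini ratio of 0, and the value grows as expenditure concentrates in the upper classes.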

    MODELLING THE AVERAGE SCORES OF NATIONAL EXAMINATION IN WEST JAVA

    Formal education in Indonesia is commonly divided into stages: preschool, primary school (SD), secondary school (SMP-SMA), and universities/colleges. The Indonesian government has made serious efforts to improve the quality of education in Indonesia. A roadmap for continuous improvement of education quality can be designed based on the results of the National Examination (UN), taken regularly by high school students. This research aimed to explore how UN scores are linked with other explanatory variables. It used panel data consisting of the average UN scores of all public senior high schools (SMA Negeri) in West Java Province during 2011-2013, together with related variables such as total accreditation scores, regional domestic product, human development index, scores for school facilities and infrastructure, scores for school educators, and average final school exam scores. The average UN scores depend on variation between high schools and time periods as well as on other explanatory variables whose effects are either fixed or random. The data were modelled with linear mixed models and with the Generalized Estimating Equation (GEE) approach, both of which are commonly used to analyse panel data. This paper showed that the GEE model performed better than the linear mixed models in explaining the variability of the response variable, the average UN score. The GEE also showed significant correlations between the explanatory variables and the response. Keywords: fixed effects, GEE, linear mixed model, national examination, random effects